Synopsis: Phantom Files
Let's learn the challenges related to the storage of images in databases.
We'll cover the following
Imagine that a catastrophe strikes your database server. While relocating a rack full of hard drives, the rack tips over and crashes. Fortunately, no one gets hurt, but the massive hard drives shatter. Even the raised floor breaks where they fall. Fortunately, the IT department is prepared: they have made good backups of every important system every day, and they quickly deploy a new server and restore your database.
It doesn’t take long during smoke testing to notice a problem: your application associates graphic images with many database entities, but the images are missing! You call the IT technician immediately.
“We restored the database and verified that it was complete until the last backup,” the technician says. “Where were the images stored?”
You remember now that in this application, images are stored outside the database, and ordinary files are stored on the filesystem. The database stores the path to the image, and the application opens each image file at that path. “The images were stored as files. They were on the /var
filesystem, same as the databases.”
The technician shakes his head. “We don’t back up files on the /var
filesystem unless you specifically tell us to. We backup all databases, of course, but other files on /var
are usually just logs, cache data, or other tmporary files. By default, they don’t get backed up.”
Your heart sinks. There were more than images used in your product catalog database. Most of them probably still exist in other places, but tracking them all down, reformatting them, and generating their thumbnail versions for web searches will take weeks.
Objective: Store images or other bulky media#
Images and other media are used in most applications these days. Sometimes, media are associated with entities stored in the database. For example, we may allow a user to have a portrait or avatar that is displayed when they post a comment. Similarly, in our bugs database, bugs often need a screenshot to illustrate the circumstances of the defect.
The objective described in this chapter is to store images and associate them with database entities, such as user accounts or bugs. When we query these entities from the database, we need the capability to retrieve the associated images in the application.
Legitimate uses of the antipattern#
There are good reasons to store images or other large objects in files outside the database:
- The database is much leaner without images because images tend to be large compared to simple data types like integers and strings.
- Backing up the database is faster, and the result is smaller if images are not included. True, we need to copy images from the filesystem as a separate backup step, but doing this is usually more manageable than backing up a huge database.
- If images are in files external to the database, it’s easier to do ad hoc image previewing or editing. For example, if we need to apply a batch edit to all our images, it’s much better to keep images external to the database.
If these advantages of storing images in files are important and the issues described earlier are not deal-breakers, we can decide that it’s the right thing to do in this project.
Some database brands also support special SQL data types that reference external files more or less transparently. Oracle calls this data type BFILE
, while SQL Server 2008 calls it FILESTREAM
.
It would be best if we plan how our application would use images to know whether the issues described in the Antipattern lesson would affect us. We need to make an informed decision, instead of listening to generalized statements like storing images in external files is always the best solution.